Gene structure prediction using an orthologous gene of known exon-intron structure.

Authors

  • Stephanie Seneff
  • Chao Wang
  • Christopher B Burge
Abstract

Given the availability of complete genome sequences from related organisms, sequence conservation can provide important clues for predicting gene structure. In particular, one should be able to leverage information about known genes in one species to help determine the structures of related genes in another. Such an approach is appealing in that high-quality gene prediction can be achieved for newly sequenced species, such as mouse and puffer fish, using the extensive knowledge that has been accumulated about human genes. This article reports a novel approach to predicting the exon-intron structures of mouse genes by incorporating constraints from orthologous human genes using techniques that have previously been exploited in speech and natural language processing applications. The approach uses a context-free grammar to parse a training corpus of annotated human genes. A statistical training procedure produces a weighted recursive transition network (RTN) intended to capture the general features of a mammalian gene. This RTN is expanded into a finite state transducer (FST) and composed with an FST capturing the specific features of the human orthologue. This model includes a trigram language model on the amino acid sequence as well as exon length constraints. A final stage uses the free software package ClustalW to align the top n candidates in the search space. For a set of 98 orthologous human-mouse pairs, we achieved 96% sensitivity and 97% specificity at the exon level on the mouse genes, given only knowledge gleaned from the annotated human genome.
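The exon-level sensitivity and specificity quoted above can be illustrated with a minimal sketch. The abstract does not say how these figures were computed, so this assumes the convention common in gene-prediction evaluation: a predicted exon counts as correct only if both of its boundaries match an annotated exon exactly, sensitivity is the fraction of annotated exons recovered, and specificity is the fraction of predicted exons that are correct. The function name, the `(start, end)` coordinate tuples, and the example coordinates are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: an exon is a (start, end) coordinate pair.
Exon = Tuple[int, int]


def exon_level_accuracy(
    predicted: Iterable[Exon], annotated: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumed convention (not stated in the abstract): an exon is correct
    only when both boundaries match an annotated exon exactly.
    """
    pred: Set[Exon] = set(predicted)
    true: Set[Exon] = set(annotated)
    correct = len(pred & true)  # exons with both boundaries exactly right

    # Sensitivity: share of annotated exons that were predicted exactly.
    sensitivity = correct / len(true) if true else 0.0
    # Specificity (in this field's usage): share of predicted exons that are correct.
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


# Made-up coordinates for illustration only.
annotated_exons = [(100, 200), (300, 400), (500, 650), (800, 900)]
predicted_exons = [(100, 200), (300, 400), (500, 640), (700, 750), (800, 900)]

sn, sp = exon_level_accuracy(predicted_exons, annotated_exons)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

In this made-up example, three of the four annotated exons are predicted exactly, and three of the five predictions are correct; the exon predicted as `(500, 640)` misses one boundary and therefore counts as wrong under the assumed convention.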

Similar resources

Identification of Novel Mutations in IL-2 Gene in Khorasan Native Fowls

The intron-exon structure of Khorasan native fowl interleukin-2 (IL-2) was investigated. For this purpose, twenty chickens were selected from the Native Fowl Breeding Station of Khorasan province, and genomic DNA was extracted using a modified conventional DNA extraction protocol. An 875 bp fragment of IL-2 was successfully amplified, including a small part of the promoter, exon 1, intron 1, an...


O-36: Evaluation of Genetic Variations in Intron 4 and Exon 5 of RABL2B Gene in Infertile Men with Oligoasthenoteratospermia and Immotile Short Tail Sperm Defects

Background: One of the main causes of male infertility is a defect in the structure and function of sperm cells. Infertile men with the oligoasthenoteratospermia (OAT) defect have sperm with abnormalities in count, motility and morphology. Patients with immotile short tail sperm (ISTS) disorder have immotile short-tailed sperm with a disorganized axoneme, and a significant decrease in sperm count. Numerou...


Genotyping of Intron 22 and Intron 1 Inversions of Factor VIII Gene Using an Inverse-Shifting PCR Method in an Iranian Family with Severe Haemophilia A

Background: Haemophilia A (HA) is an X-linked bleeding disorder caused by the absence or reduced activity of coagulation factor VIII (FVIII). Coagulation factors are a group of related proteins that are essential for the formation of blood clots. The aim of this study was to genotype the coagulation factor VIII gene mutations using Inverse Shifting PCR (IS-PCR) in an Iranian family ...


Novel Single Nucleotide Polymorphisms (SNPs) in Intron 2 and Exon 3 Regions of Leptin Gene in Sumba Ongole Cattle

The bovine leptin (LEP) gene has been widely used as a candidate gene for molecular selection to improve productivity traits in cattle. This study was carried out to identify single nucleotide polymorphisms (SNPs) in the LEP gene of Sumba Ongole (SO, Bos indicus) cows by sequencing. A total of 31 animals were analysed. The results showed that a total of 16 SNPs w...


Characterization of Calpastatin Gene in Iranian Afshari Sheep

Calpastatin is an endogenous inhibitor of calpain (calcium-dependent cysteine protease). Calpastatin activity is highly related to the rate of protein turnover and the rate of meat tenderization. In order to characterize the structure of calpastatin in the Iranian Afshari breed of sheep, intron 6 and partial exon 7 of the L domain were amplified and sequenced. A fragment of approximately...



Journal:
  • Applied bioinformatics

Volume 3, Issue 2-3

Pages: -

Publication date: 2004